In this tutorial, we utilize the GSE154109 dataset, which contains single-cell RNA sequencing (scRNA-seq) data from acute myeloid leukemia (AML) patients, to demonstrate how dimension reduction enhances cytokine activity inference. Cell type annotations were obtained from the TISCH2 database.
import warnings
warnings.filterwarnings("ignore")
import scanpy as sc
import numpy as np
import pandas as pd
import anndata as ad
import scipy
from numpy import unique
import time
from FeaSc.util_mca import *
from FeaSc.pySaaSc import *
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import roc_curve, auc
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
plt.rcParams['figure.figsize'] = [6, 4]
plt.rcParams['figure.dpi'] = 100adata = sc.read_h5ad("data/AML_GSE154109/AML_GSE154109.h5ad")
adata.obs.index = adata.obs.index.str.replace('@', '_')
adata.layers['counts'] = adata.X.copy()
adataAnnData object with n_obs × n_vars = 10799 × 17111
obs: 'UMAP_1', 'UMAP_2', 'Cluster', 'Celltype (malignancy)', 'Celltype (major-lineage)', 'Celltype (minor-lineage)', 'Patient', 'Sample', 'Tissue'
layers: 'counts'
obs = adata.obs
obs.index = obs.index.str.replace('@', '_')adata.obs['celltype'] = adata.obs['Celltype (major-lineage)'].astype('category')
sc.pp.filter_genes(adata, min_cells=3)
sc.pp.filter_cells(adata, min_genes=200)gs = read_gmt('data/CD8_Tcells_activation.gmt')
grate = get_gene_rate(background_geneset='data/c2.cp.v2025.1.Hs.symbols.gmt', signature_geneset=gs, mode="multiple")
grate| background | signature | |
|---|---|---|
| A1BG | 0.001243 | 0.0 |
| A1CF | 0.000746 | 0.0 |
| A2M | 0.003231 | 0.0 |
| A3GALT2 | 0.000746 | 0.0 |
| A4GALT | 0.001740 | 0.0 |
| … | … | … |
| ZWILCH | 0.003231 | 0.0 |
| ZWINT | 0.003231 | 0.0 |
| ZYG11B | 0.000497 | 0.0 |
| ZYX | 0.001989 | 0.0 |
| ZZZ3 | 0.001243 | 0.0 |
13868 rows × 2 columns
To infer Cytokine signaling activity, a model matrix file that quantifies the relative changes in gene expression is required. Ridge regression is then applied, using gene expression as the dependent variable (Y) and the model matrix as the independent variable (X). The activity of each cytokine is quantified as the ratio of the regression coefficient to its standard error. This methodological approach was first introduced in CytoSig. FeaSc is capable of utilizing the model matrix file from CytoSig as well as those from alternative sources.
We aimed to demonstrate that dimension reduction techniques could improve the inference of cytokine activity. In this study, we implemented three distinct dimension reduction methods: Principal Component Analysis (PCA), Non-negative Matrix Factorization (NMF), and Multiple Correspondence Analysis (MCA).
Benchmarking the performance of Feasc for cytokine activity inference presents significant challenges. It is widely accepted in the scientific community that CD8 T cell activation negatively correlates with TGFβ1 cytokine activity. This relationship has been previously utilized in the Tres method to identify immune-response related genes in T cells. Therefore, a stronger negative correlation between inferred TGFβ1 activity and CD8 T cell activation scores serves as an indicator of improved performance.
sc.pp.normalize_total(adata, target_sum=1e4, inplace=True)
sc.pp.log1p(adata)response_data = compute_response(adata, gene_rate=grate, celltype="CD8T", signature="Tcell_activation")
response_data| Tcell_activation | |
|---|---|
| Cell | |
| P1_AACCGCGGTGATGTGG-1 | -0.951382 |
| P1_AACGTTGGTGACTACT-1 | -2.726300 |
| P1_AAGACCTGTCTTTCAT-1 | -2.347866 |
| P1_ACTGTCCCACACGCTG-1 | -2.380796 |
| P1_AGAGTGGCAAGCGTAG-1 | -2.167345 |
| … | … |
| P8_TTTATGCTCGCCAAAT-1 | -1.363836 |
| P8_TTTATGCTCTCAACTT-1 | -2.808227 |
| P8_TTTGGTTCAATCGGTT-1 | -2.056600 |
| P8_TTTGGTTCACGGTAAG-1 | -2.253479 |
| P8_TTTGGTTTCATAGCAC-1 | -3.258121 |
715 rows × 1 columns
signaling_data = compute_signaling(
adata,
model_file="data/signature.centroid.expand",
celltype='CD8T',
cytokine=None,
lambd=10000,
num_permutations=0,
test_method="two-sided")
signaling_data| Activin A | BDNF | BMP2 | BMP4 | BMP6 | CD40L | CXCL12 | EGF | FGF2 | GCSF | … | PGE2 | TGFA | TGFB1 | TGFB2 | TGFB3 | TNFA | TRAIL | TWEAK | VEGFA | WNT3A | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| barcode | |||||||||||||||||||||
| P1_AACCGCGGTGATGTGG-1 | -0.094772 | -0.293682 | -0.286625 | 0.077783 | 0.321539 | 0.116699 | -0.144668 | -0.310968 | 1.021896 | -1.132794 | … | -0.924823 | 0.452272 | -0.386532 | 0.331252 | -0.301805 | 0.189115 | 1.498093 | 0.408660 | 0.558532 | -0.821857 |
| P1_AACGTTGGTGACTACT-1 | 0.126739 | 0.315829 | -0.230301 | -0.379226 | 0.598421 | 0.169795 | 0.568125 | -0.144612 | 1.183636 | -1.289984 | … | -1.073571 | 0.428590 | -0.362775 | -0.124503 | 0.291687 | 0.079039 | 1.463567 | 0.411479 | 0.365396 | -0.689990 |
| P1_AAGACCTGTCTTTCAT-1 | -0.089947 | -0.193842 | -0.100216 | -0.095971 | 0.927968 | -0.287058 | 0.299551 | -0.349758 | 1.653981 | -1.660125 | … | -1.409793 | 0.317874 | -0.481422 | -0.086316 | -0.104492 | 0.197171 | 1.128967 | 0.545923 | 0.446443 | -1.467653 |
| P1_ACTGTCCCACACGCTG-1 | 0.295517 | -0.241921 | -0.245211 | -0.163259 | 0.422830 | 0.007753 | 0.506617 | -0.212739 | 1.020806 | -0.995910 | … | -1.566678 | 0.093569 | -0.455467 | -0.024590 | -0.294104 | -0.009445 | 1.357538 | 0.279438 | 0.786081 | -0.863656 |
| P1_AGAGTGGCAAGCGTAG-1 | -0.011922 | -0.126875 | -0.053377 | -0.133999 | 0.066901 | 0.175580 | 0.506431 | -0.064762 | 0.922227 | -1.010561 | … | -1.298723 | 0.618760 | -0.378771 | 0.086789 | -0.083260 | -0.019148 | 0.693604 | 0.091681 | 0.427866 | -0.364654 |
| … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
| P8_TTTATGCTCGCCAAAT-1 | -0.065202 | -0.228488 | -0.376433 | -0.057831 | 0.423303 | -0.128511 | 0.495901 | 0.019771 | 1.505175 | -1.124849 | … | -1.657007 | 0.215126 | -0.505389 | -0.146098 | -0.303647 | 0.053365 | 1.203349 | 0.107695 | 0.381438 | -0.620364 |
| P8_TTTATGCTCTCAACTT-1 | 0.129540 | 0.195420 | -0.336719 | -0.331438 | 0.673912 | -0.260759 | 0.546337 | 0.081230 | 1.139160 | -1.068861 | … | -1.367952 | 0.478208 | -0.515014 | -0.057076 | -0.206613 | 0.102047 | 1.359984 | 0.416357 | 0.448254 | -1.036446 |
| P8_TTTGGTTCAATCGGTT-1 | 0.332880 | -0.126464 | -0.275887 | -0.225908 | 0.591053 | -0.142498 | 0.963675 | -0.046590 | 1.022271 | -0.726532 | … | -1.043937 | 0.845506 | -0.448924 | -0.372296 | 0.035262 | 0.138179 | 1.515752 | 0.281205 | 1.034865 | -0.951830 |
| P8_TTTGGTTCACGGTAAG-1 | 0.219233 | -0.393705 | -0.091263 | -0.196145 | 0.891786 | 0.132119 | 0.818010 | 0.180460 | 1.217418 | -1.298357 | … | -0.430545 | 0.652221 | -0.479393 | -0.109512 | -0.081216 | 0.304150 | 1.533638 | 0.465998 | 1.190769 | -1.575603 |
| P8_TTTGGTTTCATAGCAC-1 | 0.625703 | 0.066704 | 0.232557 | 0.221179 | 0.287647 | 0.047044 | 1.100452 | 0.040469 | 1.422489 | -0.853484 | … | -1.356366 | 0.483683 | -0.276560 | -0.192638 | 0.019490 | 0.134252 | 1.408722 | 0.471552 | 0.564261 | -0.829856 |
715 rows × 51 columns
res = pd.concat([response_data, signaling_data[["TGFB1"]]], axis=1)
res| Tcell_activation | TGFB1 | |
|---|---|---|
| P1_AACCGCGGTGATGTGG-1 | -0.951382 | -0.386532 |
| P1_AACGTTGGTGACTACT-1 | -2.726300 | -0.362775 |
| P1_AAGACCTGTCTTTCAT-1 | -2.347866 | -0.481422 |
| P1_ACTGTCCCACACGCTG-1 | -2.380796 | -0.455467 |
| P1_AGAGTGGCAAGCGTAG-1 | -2.167345 | -0.378771 |
| … | … | … |
| P8_TTTATGCTCGCCAAAT-1 | -1.363836 | -0.505389 |
| P8_TTTATGCTCTCAACTT-1 | -2.808227 | -0.515014 |
| P8_TTTGGTTCAATCGGTT-1 | -2.056600 | -0.448924 |
| P8_TTTGGTTCACGGTAAG-1 | -2.253479 | -0.479393 |
| P8_TTTGGTTTCATAGCAC-1 | -3.258121 | -0.276560 |
715 rows × 2 columns
plot_response_signaling({'No dimension reduction': res})adata = rebuild_matrix(adata, method=['pca'], n_dim=15)
adataWARNING: adata.X seems to be already log-transformed.
AnnData object with n_obs × n_vars = 10799 × 17111
obs: 'UMAP_1', 'UMAP_2', 'Cluster', 'Celltype (malignancy)', 'Celltype (major-lineage)', 'Celltype (minor-lineage)', 'Patient', 'Sample', 'Tissue', 'celltype', 'n_genes'
var: 'n_cells'
uns: 'log1p', 'active_method', 'active_dim', 'pca'
obsm: 'pca'
varm: 'pca'
layers: 'counts'
response_data = compute_response(adata, gene_rate=grate, celltype="CD8T", signature="Tcell_activation")
signaling_data = compute_signaling(
adata,
model_file="data//signature.centroid.expand",
celltype='CD8T',
cytokine=None,
lambd=10000,
num_permutations=0,
test_method="two-sided"
)
res = pd.concat([response_data, signaling_data[["TGFB1"]]], axis=1)
res| Tcell_activation | TGFB1 | |
|---|---|---|
| P1_AACCGCGGTGATGTGG-1 | -0.656675 | 0.529193 |
| P1_AACGTTGGTGACTACT-1 | -2.900612 | 0.939177 |
| P1_AAGACCTGTCTTTCAT-1 | -0.554478 | -0.271388 |
| P1_ACTGTCCCACACGCTG-1 | -0.863269 | 0.583958 |
| P1_AGAGTGGCAAGCGTAG-1 | -2.330548 | 1.219573 |
| … | … | … |
| P8_TTTATGCTCGCCAAAT-1 | -3.068340 | 0.753479 |
| P8_TTTATGCTCTCAACTT-1 | -0.941963 | 0.271050 |
| P8_TTTGGTTCAATCGGTT-1 | -2.345652 | 0.996785 |
| P8_TTTGGTTCACGGTAAG-1 | -1.669373 | 0.566825 |
| P8_TTTGGTTTCATAGCAC-1 | -3.064377 | 1.090217 |
715 rows × 2 columns
plot_response_signaling({'PCA-based method': res})adata = rebuild_matrix(adata, method=['nmf'], n_dim=15)
adataWARNING: adata.X seems to be already log-transformed.
AnnData object with n_obs × n_vars = 10799 × 17111
obs: 'UMAP_1', 'UMAP_2', 'Cluster', 'Celltype (malignancy)', 'Celltype (major-lineage)', 'Celltype (minor-lineage)', 'Patient', 'Sample', 'Tissue', 'celltype', 'n_genes'
var: 'n_cells'
uns: 'log1p', 'active_method', 'active_dim', 'pca', 'nmf'
obsm: 'pca', 'nmf'
varm: 'pca', 'nmf'
layers: 'counts'
adata.uns.get('active_method', None)['nmf']
response_data = compute_response(adata, gene_rate=grate, celltype="CD8T", signature="Tcell_activation")
signaling_data = compute_signaling(
adata,
model_file="data/signature.centroid.expand",
celltype='CD8T',
cytokine=None,
lambd=10000,
num_permutations=0,
test_method="two-sided"
)
res = pd.concat([response_data, signaling_data[["TGFB1"]]], axis=1)
plot_response_signaling({'PCA-based method': res})adata.X = adata.layers['counts']
adata = rebuild_matrix(adata, method=['mca'], n_dim=15)
adataWARNING: adata.X seems to be already log-transformed.
run mca...
mca step 1: Construct the Fuzzy Matrix and Row Weights
mca step 1 completed: Fuzzy Matrix and Row Weights constructed
svd started
svd completed
mca step 2: Calculate Cell Coordinates and Feature Loadings
mca step 2 completed: Cell Coordinates and Feature Loadings calculated
AnnData object with n_obs × n_vars = 10799 × 17111
obs: 'UMAP_1', 'UMAP_2', 'Cluster', 'Celltype (malignancy)', 'Celltype (major-lineage)', 'Celltype (minor-lineage)', 'Patient', 'Sample', 'Tissue', 'celltype', 'n_genes'
var: 'n_cells'
uns: 'log1p', 'active_method', 'active_dim', 'pca', 'nmf', 'mca'
obsm: 'pca', 'nmf', 'mca'
varm: 'pca', 'nmf', 'mca'
layers: 'counts'
response_data = compute_response(adata, gene_rate=grate, celltype="CD8T", signature="Tcell_activation")
signaling_data = compute_signaling(
adata,
model_file="data/signature.centroid.expand",
celltype='CD8T',
cytokine=None,
lambd=10000,
num_permutations=0,
test_method="two-sided"
)
res = pd.concat([response_data, signaling_data[["TGFB1"]]], axis=1)
plot_response_signaling({'MCA-based method': res})As demonstrated above, dimension reduction using MCA significantly enhanced the performance of cytokine activity inference, yielding a Pearson correlation coefficient (PCC) of -0.53, compared to 0.10 when no dimension reduction was applied.
A conventional approach for pathway activity inference relies on gene set scoring. However, this method has a significant limitation: most gene sets only include pathway member genes while excluding downstream target genes, making them inadequate for accurately assessing signaling pathway activity. Recognizing this gap, the SEED2 database has systematically compiled molecular signatures across 16 distinct signaling pathways. These signatures capture the specific gene expression alterations that occur when the pathway of interest is perturbed, thereby providing a more comprehensive representation of pathway dynamics. As a result, these curated signatures can be directly implemented in FeaSc to generate robust and reliable inferences of signaling pathway activities, offering researchers a more accurate tool for pathway functional assessment in single-cell.
df = pd.read_csv('data/SPEED2_signaling.tsv', sep='\t', nrows=0)
result = df.columns.tolist()
print(result)['Estrogen', 'H2O2', 'Hippo', 'Hypoxia', 'IL-1', 'Insulin', 'JAK-STAT', 'MAPK', 'TLR', 'Notch', 'p53', 'PI3K', 'PPAR', 'RAR', 'TGFb', 'TNFa', 'Trail', 'VEGF', 'Wnt']
adata.uns.get('active_method', None)['mca']
signaling_data = compute_signaling(
adata,
model_file="data/SPEED2_signaling.tsv",
celltype=None,
cytokine=None,
lambd=10000,
num_permutations=0,
test_method="two-sided"
)res = pd.concat([signaling_data[["Wnt"]], adata.obs['celltype']], axis=1)
res| Wnt | celltype | |
|---|---|---|
| barcode | ||
| P1_AAATGCCAGACTAGAT-1 | 0.934152 | B |
| P1_AAGGTTCTCAACGGGA-1 | 0.489379 | B |
| P1_ACACCAAGTACCGGCT-1 | 0.615619 | B |
| P1_ACATCAGGTTTAAGCC-1 | 0.873929 | B |
| P1_ACTTGTTGTGGCGAAT-1 | 1.433766 | B |
| … | … | … |
| P8_TTAGGCAAGTACGATA-1 | 0.498144 | Mono/Macro |
| P8_TTCGGTCGTTCAGACT-1 | -0.210558 | Mono/Macro |
| P8_TTCTCAACAAGCGCTC-1 | 1.765080 | Mono/Macro |
| P8_TTCTCAAGTGTGACCC-1 | 1.612464 | Mono/Macro |
| P8_TTGACTTCATGGATGG-1 | 0.951047 | Mono/Macro |
10799 rows × 2 columns
median_order = res.groupby('celltype')['Wnt'].median().sort_values(ascending=False).index
colors = ['red'] * 3 + ['blue']*3
plt.figure(figsize=(5, 4))
sns.boxplot(data=res, x='celltype', y='Wnt', order=median_order, palette=colors)
plt.title('Wnt activity by Cell Type')
plt.xlabel('Cell Type')
plt.ylabel('Wnt activity')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()